Abstract: Signature-based botnet detection methods identify
botnets by recognizing Command and Control (C&C) traffic
and can be ineffective for botnets that use new and sophisticated
mechanisms for such communications. To address these
limitations, we propose a novel botnet detection method that
analyzes the social relationships among nodes. The method
consists of two stages: (i) anomaly detection in an “interaction”
graph among nodes using large deviations results on the
degree distribution, and (ii) community detection in a social
“correlation” graph whose edges connect nodes with highly
correlated communications. The latter stage uses a refined
modularity measure and formulates the problem as a non-convex
optimization problem for which appropriate relaxation
strategies are developed. We apply our method to real-world
botnet traffic and compare its performance with other community
detection methods. The results show that our approach
works effectively and that the refined modularity measure improves
the detection accuracy.

Index Terms: Network anomaly detection, cyber-security,
social networks, random graphs, optimization.

I. Introduction 

A botnet is a network of compromised nodes (bots)
controlled by a “botmaster.” The most common type is a
botnet of network computers, which is typically used for
Distributed Denial-of-Service (DDoS) attacks, click fraud,
and spamming. DDoS attacks comprise packet streams
from disparate bots that aim to consume some critical resource
at the target and thereby deny the service of the target
to legitimate clients. In a recent survey, 300 out of 1000
surveyed businesses had suffered from DDoS attacks, and
65% of the attacks caused losses of up to $10,000 per hour [1].
Both click fraud and spamming are harmful to the web economy.
Click fraud exhausts the advertisement budgets of businesses
in pay-per-click services [2], and spamming is popular for
malicious advertisements as well as manipulation of search
results [3].

Because of the large losses caused by botnets, detecting
them in time is very important. Most existing botnet
detection approaches focus on the Command and Control
(C&C) channels required by botmasters to command their
bots [4], [5]. One mechanism is to filter specific types of
C&C traffic (e.g., IRC traffic) [6], [7], [8]. Recently, botnets
have evolved to bypass these detection methods by using
more sophisticated C&C channels, such as HTTP and P2P
protocols [9], [2]. P2P botnets like Nugache [10] and the Storm
worm [9] are much harder to detect and mitigate because
they are decentralized. In addition, more types of C&C
channels are emerging; recent research shows that botnets
have started to use Twitter as a C&C channel [11]. It is very
challenging to identify and monitor these sophisticated C&C
channels. Furthermore, the cost of switching C&C channels
is much lower than the cost of monitoring them, so botnets can
bypass detection by changing C&C channels frequently.

In addition to C&C channels, botnets have some behavioral
characteristics. First, the activities of bots are more correlated
with each other than those of normal nodes [12], [8]. Second, bots
have more interactions with a set of pivotal nodes, including
targets and botmasters. Compared with C&C traffic, these
behavioral characteristics are harder to hide.

In this paper, we propose a novel botnet detection framework
based on these behavioral characteristics. Instead of
focusing on C&C channels, we detect botnets by analyzing
the social relationships, modeled as graphs of nodes. Two
types of social graphs are considered: (i) Social Interaction
Graphs (SIGs), in which two nodes are connected if there
is interaction between them, and (ii) Social Correlation
Graphs (SCGs), in which two nodes are connected if their
behaviors are correlated. We apply our method to real-world
botnet traffic, and the results show that it has high detection
accuracy.

II. Method Overview 

We assume the data to be a sequence of interaction
records; each record r = (timestamp, id1, id2) contains a
timestamp and the IDs of the two participants. For botnets
of network computers, an interaction record corresponds to a
network packet.

We group interaction records into windows based on their
timestamps. For every k, we denote by $\mathcal{W}_k$ the collection of
interaction records in window k and present the definition of
the Social Interaction Graph (SIG) for window k as follows.

Definition 1 

(Social Interaction Graph). Let $\mathcal{E}_k$ be an edge set such that
$(i, j) \in \mathcal{E}_k$ if there exists at least one interaction record
$r \in \mathcal{W}_k$ whose participant IDs are i and j. Then, the SIG
$\mathcal{G}_k = (\mathcal{V}, \mathcal{E}_k)$ corresponding to $\mathcal{W}_k$ is an undirected graph
whose vertex set $\mathcal{V}$ is the set of all nodes in the network and
whose edge set is $\mathcal{E}_k$.

As a notational remark, throughout the paper we use
n to denote the number of nodes in the network (the cardinality
of $\mathcal{V}$).
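The construction in Definition 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_sigs`, the fixed window length `window_len`, and the choice to drop self-interactions are ours and are not specified in the text.

```python
from collections import defaultdict

def build_sigs(records, window_len):
    """Group (timestamp, id1, id2) records into windows; return one edge set per window.

    Each edge is a frozenset so that (i, j) and (j, i) denote the same undirected edge.
    """
    windows = defaultdict(set)
    for timestamp, id1, id2 in records:
        if id1 == id2:
            continue  # assumption: self-interactions are ignored to keep the graph simple
        k = int(timestamp // window_len)  # index of the window containing this record
        windows[k].add(frozenset((id1, id2)))
    return dict(windows)

# Hypothetical records: two windows of 10 seconds each.
records = [(0.5, "a", "b"), (3.0, "b", "a"), (12.0, "a", "c"), (15.5, "c", "d")]
sigs = build_sigs(records, window_len=10)
```

Repeated records between the same pair of nodes within a window collapse into a single edge, which matches the "at least one interaction record" condition of the definition.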

Our method consists of a network anomaly detection
stage and a botnet discovery stage (see Fig. 1). In the
network anomaly detection stage, each SIG is evaluated against
a reference model and abnormal SIGs are stored in a pool
$\mathcal{A}$. The botnet discovery stage is triggered whenever the
size of the pool $\mathcal{A}$ is greater than a threshold p. A set
of highly interactive nodes, referred to as pivotal nodes,
is then identified. Both botmasters and targets are very likely
to be pivotal nodes because they need to interact with bots
frequently. These interactions correspond to C&C traffic for
botmasters and to attack traffic for targets. In either case,
the interactions between each bot and the pivotal nodes should
be correlated. To characterize this correlation, we construct
a Social Correlation Graph (SCG), whose formal definition
is in Section IV-B.1. We can detect bots by detecting the
community that has high interaction with pivotal nodes in
the SCG. We propose a novel community detection method
based on a refined modularity measure. This modularity
measure uses information in the SIGs, i.e., the pivotal interaction
measure (see Section IV-B.3), to improve detection accuracy.


[Figure 1: two panels, “Stage 1: Network Anomaly Detection” and “Stage 2: Botnet Discovery”.]

Fig. 1. Overview of Our Method.


III. Network Anomaly Detection 

As noted above, the goal of the network anomaly detection 
stage is to identify abnormal SIGs given some knowledge 
of what constitutes “normal” interactions between nodes. 
A natural way is to monitor the degree distributions of
graphs and to compare them with appropriate reference
graph models. This paper focuses on the Erdős-Rényi (ER)
model, the most common type of random graph model.
Our approach, however, can be generalized to other types
of models. We apply composite hypothesis testing to detect
abnormal graphs.

A. Large Deviation Principle for ER Random Graphs 

First, we present a Large Deviation Principle (LDP) for
undirected random graphs. Let $\mathcal{G}_n$ denote the space of all
simple labeled undirected graphs of n vertices. For any graph
$\mathcal{G} \in \mathcal{G}_n$, let $d = (d_1, \ldots, d_n)$ denote the labeled degree
sequence of $\mathcal{G}$. Also let $m = \frac{1}{2}\sum_{j=1}^{n} d_j$ denote the number
of edges in $\mathcal{G}$. We assume that any two nodes are connected
by at most one edge, which means that the node degree in $\mathcal{G}$
is less than n. For $0 \le i \le n-1$, let $h_i = \sum_{j=1}^{n} \mathbf{1}(d_j = i)$
be the number of vertices in $\mathcal{G}$ of degree i, where $\mathbf{1}(\cdot)$ is
the indicator function. Henceforth, $h = (h_0, \ldots, h_{n-1})$, a
quantity irrelevant to the ordering of vertices, will be referred
to as the degree frequency vector of the graph $\mathcal{G}$. The empirical
distribution of the degree sequence d, denoted by $\mu^{(n)}$, is a
probability measure on $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ that puts mass $h_i/n$
at i, for $0 \le i \le n-1$.
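To make the notation concrete, the following sketch computes the degree frequency vector h and the empirical degree distribution for a small graph. The function name `degree_distribution` and the integer vertex labels 0, ..., n-1 are assumptions made for the example.

```python
def degree_distribution(n, edges):
    """Return (h, mu): the degree frequency vector and the empirical degree distribution.

    n     -- number of vertices, labeled 0..n-1
    edges -- iterable of undirected edges (i, j) with i != j and no duplicates
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    h = [0] * n                      # h[i] = number of vertices of degree i, 0 <= i <= n-1
    for d in degree:
        h[d] += 1
    mu = [count / n for count in h]  # mass h[i]/n at degree i
    return h, mu

# A star on four vertices: one vertex of degree 3 and three vertices of degree 1.
h, mu = degree_distribution(4, [(0, 1), (0, 2), (0, 3)])
```

Relabeling the vertices permutes the degree sequence d but leaves h and the empirical distribution unchanged, which is why h is described as irrelevant to the ordering of vertices.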

In the Erdős-Rényi model, G(n, p), the distribution of
the degree of any particular vertex v is binomial. Namely,
$P(d_v = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$, where n is the
total number of vertices in the graph. It is well known that
when $n \to \infty$ and np is constant, the binomial distribution
converges to a Poisson distribution. Let $\beta = np$ denote
the constant. Then in the limiting case, the probability that
the degree of a node equals k is $p_{\beta,k} = \frac{\beta^k e^{-\beta}}{k!}$, which is
independent of the node label. Let $p_\beta = (p_{\beta,0}, \ldots, p_{\beta,\infty})$ be
the Poisson distribution with parameter $\beta$, viewed as a
vector.

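Let $\mathcal{P}(\mathbb{N}_0)$ be the space of all probability measures
defined on $\mathbb{N}_0$. We view any probability measure $\mu \in
\mathcal{P}(\mathbb{N}_0)$ as an infinite vector $\mu = (\mu_0, \ldots, \mu_\infty)$. Let $\mathcal{S} =
\{\mu \in \mathcal{P}(\mathbb{N}_0) : \bar{\mu} := \sum_{i=0}^{\infty} i\mu_i < \infty\}$ be the set of all probability
measures on $\mathbb{N}_0$ with finite mean. It is easy to verify
that $p_\beta \in \mathcal{S}$. Let $\mathbb{P}_n$ denote the Erdős-Rényi distribution on
the space $\mathcal{G}_n$ with parameter $\beta/n$.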

The so-called rate function $I : \mathcal{S} \to [-\infty, \infty]$ can be used
to quantify the deviations of $\mu^{(n)}$ with respect to a random
graph model ([13], [14]). For the ER model, [13] proposes
the following rate function.

Definition 2 

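For the ER model with parameter $\beta$ for its degree distribution,
we define the rate function $I_{ER} : \mathcal{S} \to [-\infty, \infty]$
as

$$I_{ER}(\mu; \beta) = D(\mu \,\|\, p_\beta) + \frac{1}{2}(\bar{\mu} - \beta) + \frac{\bar{\mu}}{2}\log\beta - \frac{\bar{\mu}}{2}\log\bar{\mu},$$

where $\bar{\mu} = \sum_i i\mu_i$ is the mean of $\mu$ and
$D(\mu \,\|\, p_\beta) = \sum_i \mu_i \log\left(\frac{\mu_i}{p_{\beta,i}}\right)$ is the Kullback-Leibler
(KL) divergence of $\mu$ with respect to $p_\beta$.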
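As a sanity check on Definition 2, consider the special case in which $\mu = p_\lambda$ is itself a Poisson distribution with mean $\lambda$. This case is our illustration and is not taken from [13]; it assumes the rate function has the form $I_{ER}(\mu;\beta) = D(\mu\|p_\beta) + \frac{1}{2}(\bar{\mu}-\beta) + \frac{\bar{\mu}}{2}\log\beta - \frac{\bar{\mu}}{2}\log\bar{\mu}$ with $\bar{\mu} = \sum_i i\mu_i$.

```latex
% KL divergence between two Poisson laws, using
% \log(p_{\lambda,k} / p_{\beta,k}) = k \log(\lambda/\beta) - \lambda + \beta
% and \sum_k k \, p_{\lambda,k} = \lambda:
D(p_\lambda \,\|\, p_\beta)
  = \sum_{k \ge 0} p_{\lambda,k} \left[ k \log\frac{\lambda}{\beta} - \lambda + \beta \right]
  = \lambda \log\frac{\lambda}{\beta} - \lambda + \beta .

% Substituting \bar{\mu} = \lambda into the rate function:
I_{ER}(p_\lambda; \beta)
  = \lambda \log\frac{\lambda}{\beta} - \lambda + \beta
    + \frac{1}{2}(\lambda - \beta)
    - \frac{\lambda}{2} \log\frac{\lambda}{\beta}
  = \frac{1}{2} \left[ \lambda \log\frac{\lambda}{\beta} - \lambda + \beta \right]
  = \frac{1}{2} D(p_\lambda \,\|\, p_\beta) .
```

Under these assumptions the rate function vanishes at $\lambda = \beta$ and is strictly positive otherwise, so a degree distribution that matches the reference model incurs no penalty.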

[13] further establishes an LDP for $\mu^{(n)}$ with this rate
function. In the interest of space, we will not provide a formal
statement of the LDP. Intuitively, when n is large enough, the
empirical degree distribution behaves as
$\mathbb{P}_n(\mu^{(n)} \approx \mu) \asymp e^{-n I_{ER}(\mu; \beta)}$.

B. A Formal Anomaly Detection Test 

In this section, we consider the problem of evaluating
whether a graph $\mathcal{G}$ is normal, i.e., comes from the ER model
with a certain set of parameters ($\mathcal{H}_0$). Let $\mu_{\mathcal{G}}$ be the empirical
degree distribution of the graph $\mathcal{G}$ and let $I_{ER}(\mu_{\mathcal{G}}; \beta)$
(cf. Def. 2) be the corresponding rate function. We present
the following statement of the generalized Hoeffding test for
this anomaly detection problem.

Definition 3 

The Hoeffding test [15] is to reject $\mathcal{H}_0$ when $\mathcal{G}$ is in the set

$$S_F^* = \{\mathcal{G} \mid I_{ER}(\mu_{\mathcal{G}}; \beta) > \lambda\}, \qquad (1)$$

where $\lambda$ is a detection threshold.

It can be shown that the Hoeffding test (1) satisfies the
Generalized Neyman-Pearson (GNP) criterion [14].
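A minimal sketch of test (1) follows. It assumes the rate function of Definition 2 has the form $I_{ER}(\mu;\beta) = D(\mu\|p_\beta) + \frac{1}{2}(\bar{\mu}-\beta) + \frac{\bar{\mu}}{2}\log\beta - \frac{\bar{\mu}}{2}\log\bar{\mu}$; the function names and the truncation of the Poisson vector at 80 terms are choices made for the example only.

```python
import math

def log_poisson_pmf(beta, k):
    # log of beta^k * exp(-beta) / k!, computed in log space to avoid overflow
    return -beta + k * math.log(beta) - math.lgamma(k + 1)

def rate_function_er(mu, beta):
    """I_ER(mu; beta) for a degree distribution mu, where mu[i] is the mass at degree i."""
    mean = sum(i * m for i, m in enumerate(mu))
    kl = sum(m * (math.log(m) - log_poisson_pmf(beta, i))
             for i, m in enumerate(mu) if m > 0)
    value = kl + 0.5 * (mean - beta) + 0.5 * mean * math.log(beta)
    if mean > 0:  # convention: 0 * log(0) = 0
        value -= 0.5 * mean * math.log(mean)
    return value

def is_abnormal(mu, beta, threshold):
    # Hoeffding-style decision: reject the reference model when the rate exceeds the threshold
    return rate_function_er(mu, beta) > threshold

def poisson_vector(lam, length=80):
    # truncated Poisson distribution, used here only to build test inputs
    return [math.exp(log_poisson_pmf(lam, k)) for k in range(length)]

normal_like = poisson_vector(2.0)  # matches the reference parameter beta = 2
shifted = poisson_vector(4.0)      # mean degree twice as large as the reference
```

With a threshold of 0.18 (the value used later in the experiments), the shifted distribution is flagged and the matching one is not.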

IV. Botnet Discovery 

The network anomaly detection technique in the previous
section can only raise an alarm when a botnet exists.
In order to learn more about the botnet, we develop the
botnet discovery technique described in this section. The first
challenge for botnet discovery is that a single abnormal SIG
is usually insufficient to infer complete information about a
botnet, including the botmasters and the bots in the botnet.
As a result, we monitor windows continuously and store all
abnormal SIGs in a pool $\mathcal{A}$. The botnet discovery stage is
triggered only when $|\mathcal{A}| > p$.

A. Identification of Pivotal Nodes 

We assume a sequence of abnormal SIGs $\mathcal{A} =
\{\mathcal{G}_1, \ldots, \mathcal{G}_{|\mathcal{A}|}\}$. Detecting bots directly is non-trivial. Instead,
detecting the leaders (botmasters) or targets is much
simpler because they are more interactive than normal nodes.
Botmasters need to “command and control” their bots in
order to maintain the botnet, and bots actively interact with
victims in typical DDoS attacks. Both leaders and targets,
henceforth referred to as pivotal nodes, are highly interactive.
Let $G^k_{ij}$ be an indicator of edge existence between nodes i and
j in $\mathcal{G}_k$. Then, for i = 1, ..., n,

$$e_i = \frac{1}{|\mathcal{A}|} \sum_{k=1}^{|\mathcal{A}|} \sum_{j=1}^{n} G^k_{ij} \qquad (2)$$

represents the amount of interaction of node i with all
other nodes in $\mathcal{A}$. Henceforth, $e_i$ is referred to as the total
interaction measure of node i. We present the following
definition of pivotal nodes.

Definition 4 

(Pivotal nodes). We define the set of pivotal nodes $\mathcal{N} =
\{i : e_i > \tau\}$, where $\tau$ is a threshold.

After identifying the pivotal nodes, the problem is equivalent to
detecting the community associated with the pivotal nodes.
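The total interaction measure and Definition 4 can be sketched as follows. The representation of each SIG as a set of undirected edges over nodes 0, ..., n-1, and the names `total_interaction` and `pivotal_nodes`, are assumptions made for the example.

```python
def total_interaction(sigs, n):
    """e[i] = (1/|A|) * sum over SIGs k and nodes j of the edge indicator G^k_ij."""
    e = [0.0] * n
    for edges in sigs:       # each SIG is a set of undirected edges (i, j)
        for i, j in edges:
            e[i] += 1        # an edge contributes once to each endpoint
            e[j] += 1
    return [x / len(sigs) for x in e]

def pivotal_nodes(sigs, n, tau):
    """Nodes whose total interaction measure exceeds the threshold tau."""
    e = total_interaction(sigs, n)
    return {i for i in range(n) if e[i] > tau}

# Hypothetical pool of three abnormal SIGs on four nodes; node 0 is the most interactive.
pool = [{(0, 1), (0, 2), (0, 3)}, {(0, 1), (0, 2)}, {(0, 3), (2, 3)}]
e = total_interaction(pool, n=4)
```

In this toy pool, node 0 averages two interactions per window, so any threshold between 1 and 2 selects it as the only pivotal node.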

B. Botnet Discovery 

1) Construction of the Social Correlation Graph: Compared
to similar approaches in community detection, e.g., the
leader-follower algorithm [16], our method takes advantage
of not only temporal features (SIG) but also correlation
relationships. These relationships are characterized using a
graph, whose definition is presented next.

For i = 1, ..., n, let the variable $X_i$ represent the number
of pivotal nodes in $\mathcal{N}$ that node i has interacted with.
Let $\rho(X_i, X_j)$ be the sample Pearson correlation coefficient
between the two random variables $X_i$ and $X_j$. In addition, if the
sample standard deviation of either $X_i$ or $X_j$ equals zero,
we let $\rho(X_i, X_j) = 0$ to avoid division by zero.


Definition 5 

(Social Correlation Graph). The Social Correlation Graph
(SCG) $\mathcal{C} = (\mathcal{V}, \mathcal{E}_c)$ is an undirected graph with vertex set $\mathcal{V}$
and edge set $\mathcal{E}_c = \{(i, j) : |\rho(X_i, X_j)| > \tau_\rho\}$, where $\tau_\rho$ is
a threshold.
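The following sketch builds an SCG from a pool of abnormal SIGs. The text does not spell out how the samples of each variable are obtained; the sketch assumes one sample per abnormal SIG, namely the number of pivotal nodes that a node interacts with in that window. All function names are hypothetical.

```python
import math

def pivotal_counts(sigs, n, pivotal):
    """X[i][k] = number of pivotal nodes that node i interacts with in SIG k (assumed sampling)."""
    X = [[0] * len(sigs) for _ in range(n)]
    for k, edges in enumerate(sigs):
        for i, j in edges:
            if j in pivotal:
                X[i][k] += 1
            if i in pivotal:
                X[j][k] += 1
    return X

def pearson(x, y):
    """Sample Pearson correlation; defined as 0 when either sample has zero deviation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def build_scg(X, tau_rho):
    """Edge set {(i, j), i < j : |rho(X_i, X_j)| > tau_rho}."""
    n = len(X)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pearson(X[i], X[j])) > tau_rho}

# Hypothetical pool: node 0 is pivotal; nodes 1 and 2 contact it in the same windows.
pool = [{(0, 1), (0, 2)}, {(0, 3)}, {(0, 1), (0, 2)}, {(1, 3)}]
X = pivotal_counts(pool, n=4, pivotal={0})
scg = build_scg(X, tau_rho=0.6)
```

Because the edge rule uses the absolute value of the correlation, strongly anti-correlated nodes are also connected; in this toy pool, lowering the threshold to 0.3 adds the edges (1, 3) and (2, 3).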

Because the behaviors of the bots are correlated, they are
more likely to be connected to each other in the SCG. Our
problem is to find an appropriate division of the SCG to
separate bots and normal nodes. Our criterion for “appropriate”
is related to the well-known concept of modularity in
community detection [17], [18], [19].

2) Modularity-based Community Detection: The problem 
of community detection in a graph amounts to dividing the 
vertices of a given graph into non-overlapping groups such 
that connections within groups are relatively dense while 
those between groups are sparse [18]. 

The modularity for a given subgraph is defined to be the 
fraction of edges within the subgraph minus the expected 
fraction of such edges in a randomized null model. Although 
it was proposed as the stopping criterion of a method, this 
measure later inspired a broad range of community detection 
methods named modularity-maximization methods. 

We consider the simple case when there is only one botnet
in the network. As a result, we want to divide the nodes
into two groups, one for bots and one for normal nodes.
Suppose that $s_i$ is a variable such that $s_i = 1$ if node i is a
bot and $s_i = -1$ otherwise. Let $d_i^c$ be the degree of node i
in the SCG $\mathcal{C} = (\mathcal{V}, \mathcal{E}_c)$ for i = 1, ..., n and let $m_c = \frac{1}{2}\sum_i d_i^c$
be the number of edges of $\mathcal{C}$. For a partition specified by $s =
(s_1, \ldots, s_n)$, its modularity is defined as in [18]:

$$Q(s) = \frac{1}{2m_c} \sum_{i,j=1}^{n} (A_{ij} - N_{ij})\, \delta(s_i, s_j), \qquad (3)$$

where $\delta(s_i, s_j) = \frac{1}{2}(s_i s_j + 1)$ is an indicator of whether
node i and node j are of the same type, $A_{ij} =
\mathbf{1}(|\rho(X_i, X_j)| > \tau_\rho)$ is an indicator of the adjacency of node
i and node j, and $N_{ij}$ is the expected number of edges between
node i and node j in a null model. The selection of the null
model is empirical, but the most common choice by far is the
configuration model [20], in which $N_{ij} = \frac{d_i^c d_j^c}{2m_c}$. The optimal
division of vertices should maximize the modularity measure
Q.
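As an illustration of the modularity in (3) under the configuration null model, the sketch below evaluates it for a small graph. It assumes the standard normalization by twice the number of edges; the function name and the example graph are ours.

```python
def modularity(adj, s):
    """Q(s) = 1/(2m) * sum_ij (A_ij - d_i d_j / (2m)) * delta(s_i, s_j)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # twice the number of edges; must be positive
    q = 0.0
    for i in range(n):
        for j in range(n):
            if s[i] == s[j]:  # delta(s_i, s_j) = 1 only for nodes of the same type
                q += adj[i][j] - deg[i] * deg[j] / two_m
    return q / two_m

# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

q_split = modularity(adj, [1, 1, 1, -1, -1, -1])  # one triangle per group
q_single = modularity(adj, [1, 1, 1, 1, 1, 1])    # everything in one group
```

Splitting along the bridge gives a modularity of 5/14, while placing all nodes in a single group gives zero, which is the behavior the measure is designed to reward.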

3) Refined Modularity: We introduce two refinements
to the modularity measure to make it suitable for botnet
detection. First, intuitively, bots should have strong interactions
with pivotal nodes and normal nodes should have
weak interactions. We want to maximize the difference. As
a result, our objective considers the interaction of nodes with the
pivotal nodes. Let

$$r_i = \frac{1}{|\mathcal{A}|} \sum_{k=1}^{|\mathcal{A}|} \sum_{j \in \mathcal{N}} G^k_{ij} \qquad (4)$$

denote the amount of interaction between node i and the pivotal
nodes. We refer to $r_i$ as the pivotal interaction measure of node
i. Then, $\sum_i r_i s_i$ quantifies the difference between the pivotal
interaction measure of bots and that of normal nodes. A
natural extension of the modularity measure is to include
an additional term to maximize $\sum_i r_i s_i$.

Second, the modularity measure has been criticized for
its low resolution, namely it favors large communities and
ignores small ones [21], [22]. The botnet, however, could
possibly be small. To address this issue, we introduce a
regularization term for the size of botnets. It is easy to obtain
that $\sum_i \mathbf{1}(s_i = 1) = \sum_i \frac{s_i + 1}{2}$ is the number of detected
bots. Thus, our refined modularity measure is

$$Q_d(s) = \frac{1}{2m_c} \sum_{i,j \in \mathcal{V}} \left(A_{ij} - \frac{d_i^c d_j^c}{2m_c}\right) s_i s_j + w_1 \sum_i r_i s_i - w_2 \sum_i \frac{s_i + 1}{2}, \qquad (5)$$

where $w_1$ and $w_2$ are appropriate weights.

The two modifications also influence the results for isolated
nodes with degree 0, which possibly exist in SCGs. By
Def. 5, a node is isolated if its sample standard deviation is zero
or its correlations with other nodes are small enough. The
placement of isolated nodes, however, does not influence the
traditional modularity measure, resulting in arbitrary community
detection results [18]. This limitation is addressed by the
two additional terms. If node i is isolated and $r_i = 0$, then
$s_i = -1$ in the solution because of the regularization term
$w_2 \sum_i \frac{s_i + 1}{2}$. On the contrary, if $r_i$ is large enough, $s_i = 1$
in the solution because of the term $w_1 \sum_i r_i s_i$.
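The effect of the two extra terms on isolated nodes can be checked numerically. The sketch below assumes the refined measure is the modularity quadratic form in the labels, plus a weighted pivotal-interaction term, minus a weighted count of detected bots; the weights and the three-node graph are hypothetical.

```python
def refined_modularity(adj, r, s, w1, w2):
    """Modularity quadratic form + w1 * sum_i r_i s_i - w2 * (number of nodes labeled as bots)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # twice the number of edges; must be positive
    quad = sum((adj[i][j] - deg[i] * deg[j] / two_m) * s[i] * s[j]
               for i in range(n) for j in range(n)) / two_m
    pivotal_term = w1 * sum(ri * si for ri, si in zip(r, s))
    size_penalty = w2 * sum((si + 1) / 2 for si in s)
    return quad + pivotal_term - size_penalty

# Nodes 0 and 1 are connected; node 2 is isolated in the correlation graph.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]

# Case 1: the isolated node has no pivotal interaction, so labeling it normal is better.
r_quiet = [0.9, 0.8, 0.0]
gain_normal = (refined_modularity(adj, r_quiet, [1, 1, -1], 1.0, 0.5)
               - refined_modularity(adj, r_quiet, [1, 1, 1], 1.0, 0.5))

# Case 2: the isolated node interacts heavily with pivotal nodes, so labeling it a bot is better.
r_active = [0.9, 0.8, 0.9]
gain_bot = (refined_modularity(adj, r_active, [1, 1, 1], 1.0, 0.5)
            - refined_modularity(adj, r_active, [1, 1, -1], 1.0, 0.5))
```

In the first case the gain equals the size weight, and in the second it equals twice the weighted pivotal interaction minus the size weight, so the label of an isolated node is no longer arbitrary.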

C. Relaxation of the Optimization Problem 

The modularity-maximization problem has been shown to
be NP-complete [23], [24]. The existing algorithms for
this problem can be broadly categorized into two types: (i)
heuristic methods that solve this problem directly [25], and
(ii) mathematical programming methods that relax it into an
easier problem first [23], [26]. We follow the second route
because it is more rigorous.

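We define the modularity matrix $M = \{M_{ij}\}_{i,j=1}^{n}$,
where $M_{ij} = \frac{A_{ij}}{2m_c} - \frac{d_i^c d_j^c}{(2m_c)^2}$. Let $s = (s_1, \ldots, s_n)$ and
$r = (r_1, \ldots, r_n)$; then the modularity-maximization problem
becomes

$$\max \; s' M s + \left(w_1 r - \frac{w_2}{2}\mathbf{1}\right)' s \qquad (6)$$
$$\text{s.t.} \;\; s_i^2 = 1, \;\; \forall i.$$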

To make the objective function concave, we introduce a
negative multiple of $s' I s$ [26], leading to:

$$\max \; s'(M - \sigma I)s + \left(w_1 r - \frac{w_2}{2}\mathbf{1}\right)' s \qquad (7)$$
$$\text{s.t.} \;\; s_i^2 = 1, \;\; \forall i,$$

where $\sigma$ is a positive scalar. Notice that problem (7) is
equivalent to problem (6) because $s' I s = \sum_i s_i^2 = n$ is ensured
by the constraint, so the two objectives differ only by the
constant $\sigma n$. We can choose $\sigma$ large enough so that
$M - \sigma I$ is negative definite. This modification induces no
extra computational cost. Although the feasible domain of the
revised problem is still non-convex, the objective is now
concave. Problem (7) is a typical non-convex Quadratically Constrained
Quadratic Programming (QCQP) problem [27]. Let $S = ss'$, $P_0 =
M - \sigma I$, and $q_0 = w_1 r - \frac{w_2}{2}\mathbf{1}$. We can relax problem (7) to

$$\max \; \mathrm{Tr}(S P_0) + q_0' s \qquad (8)$$
$$\text{s.t.} \;\; \begin{bmatrix} S & s \\ s' & 1 \end{bmatrix} \succeq 0, \quad S_{ii} = 1, \;\; \forall i.$$


The problem above is a Semidefinite Programming problem 
(SDP) and produces an upper bound on the optimal value 
of the original problem [27]. It is well known that SDP 
is polynomially solvable and many solvers (CSDP [28], 
SDPA [29]) are available. 
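The step from the QCQP (7) to the SDP (8) can be spelled out as follows. This is the standard lifting argument, written with a prime for the transpose and with $P_0$ denoting the matrix of the quadratic term in (7); it is offered as a reading aid rather than as the derivation used in [27].

```latex
% With S = s s', the quadratic term becomes linear in S:
s' P_0 s = \mathrm{Tr}(s' P_0 s) = \mathrm{Tr}(P_0 \, s s') = \mathrm{Tr}(S P_0).

% The constraint s_i^2 = 1 is the diagonal of S = s s':
S_{ii} = s_i^2 = 1, \quad \forall i.

% Dropping the rank-one requirement S = s s' and keeping only S - s s' \succeq 0
% gives, by the Schur complement, a linear matrix inequality:
S - s s' \succeq 0
\;\Longleftrightarrow\;
\begin{bmatrix} S & s \\ s' & 1 \end{bmatrix} \succeq 0.
```

Every feasible $s$ of (7) yields a feasible pair $(ss', s)$ of (8) with the same objective value, which is why the optimal value of (8) is an upper bound on that of (7).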

1) Randomization: The SDP relaxation (8) provides an
optimal solution together with an upper bound on the optimal
value of problem (7). However, the solution of the SDP
relaxation (8) may not be feasible for the original problem
(7). To generate feasible solutions, we use a randomization
technique.

If $(S^*, s^*)$ is the optimal solution of the relaxed problem,
then $S^* - s^* s^{*\prime}$ can be interpreted as a covariance matrix.
If we pick $x = (x_1, \ldots, x_n)$ as a Gaussian random vector
with $x \sim \mathcal{N}(s^*, S^* - s^* s^{*\prime})$, then x “solves” the non-convex
QCQP in (7) “on average” over this distribution. As a result,
we can draw samples x from this normal distribution and
simply obtain feasible solutions by taking $\hat{x} = \mathrm{sgn}(x)$. We
sample 10,000 points and pick the point that maximizes
$f(\hat{x}) = \hat{x}'(M - \sigma I)\hat{x} + \left(w_1 r - \frac{w_2}{2}\mathbf{1}\right)' \hat{x}$.
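A sketch of the randomization step in pure Python is given below. It assumes the relaxed solution is already available (no SDP solver is called here); the small diagonal jitter in the Cholesky factorization, the tie-breaking rule sgn(0) = +1, and the toy inputs are our choices for the example.

```python
import math
import random

def cholesky(cov, jitter=1e-9):
    """Lower-triangular L with L L' close to cov; the jitter guards against a singular covariance."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(acc, 0.0) + jitter)
            else:
                low[i][j] = acc / low[j][j]
    return low

def objective(x, P0, q0):
    """f(x) = x' P0 x + q0' x."""
    n = len(x)
    quad = sum(x[i] * P0[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(q0[i] * x[i] for i in range(n))

def randomized_rounding(S_star, s_star, P0, q0, num_samples=10000, seed=0):
    """Sample x ~ N(s*, S* - s* s*'), round to signs, and keep the best feasible point."""
    rng = random.Random(seed)
    n = len(s_star)
    cov = [[S_star[i][j] - s_star[i] * s_star[j] for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    best_x, best_val = None, -math.inf
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [s_star[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        x_hat = [1 if v >= 0 else -1 for v in x]  # sgn(x), with sgn(0) taken as +1
        val = objective(x_hat, P0, q0)
        if val > best_val:
            best_x, best_val = x_hat, val
    return best_x, best_val

# Toy inputs standing in for a relaxed solution and for the matrices of the QCQP.
P0 = [[-1.0, 0.5, -0.5], [0.5, -1.0, -0.5], [-0.5, -0.5, -1.0]]
q0 = [0.2, 0.1, -0.3]
S_star = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s_star = [0.0, 0.0, 0.0]
best_x, best_val = randomized_rounding(S_star, s_star, P0, q0, num_samples=2000)
```

Every rounded sample satisfies the sign constraint by construction, so the returned point is always feasible for the original problem even when the relaxed solution is not.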


V. Experimental Results 

In this section, we apply our network anomaly detection
approach to real-world traffic. We also compare
the performance of our botnet discovery approach,
a modularity-based community detection technique, with
existing community detection techniques.


A. Description of Dataset 

In this paper, we mix some real-world botnet traffic 
with some real-world background traffic. For the real-world 
botnet traffic, we use the “DDoS Attack 2007” dataset 
by the Cooperative Association for Internet Data Analysis 
(CAIDA) [30]. It includes traces from a Distributed Denial- 
of-Service (DDoS) attack on August 4, 2007. The DDoS 
attack attempts to block access to the targeted server by con¬ 
suming computing resources on the server and by consuming 
all of the bandwidth of the network connecting the server to 
the Internet. 

The total size of the dataset is 21 GB and the dataset
covers about one hour (20:50:08 UTC to 21:56:16 UTC).
This dataset contains only attack traffic to the victim; all
other traffic, including the C&C traffic, has been removed by
the creator of the dataset. The dataset consists of two parts.
The first part is the traffic when the botnet initiates the attack
(between 20:50 UTC and 21:13 UTC). In the initiating stage,
the bots probe whether they can reach the victim in order
to determine the set of nodes that should participate in the
attack. The traffic of the botnet during this period is small;
thus, it is very challenging to detect the botnet using only network
load. The second part is the attack traffic, which starts around
21:13 UTC when the network load increases rapidly (within
a few minutes) from about 200 Kb/s to about 80 Mb/s. With
this significant change in transmission rate, it is trivial to
detect the botnet once the attack starts (after 21:13 UTC). In
this paper, we select a 5-minute segment from the first part,
i.e., during the time when the botnet initiates the attack. The
total number of bot IP addresses in the selected traffic is 136.

For the background traffic, we use trace 6 in the University
of Twente traffic traces data repository (simpleweb) [31].
This trace was measured on a 100 Mb/s Ethernet link
connecting an educational organization to the Internet. This
is a relatively small organization with around 35 employees
and a little over 100 students working and studying at this
site (the headquarters of this organization). There are 100
workstations at this location, all of which have a 100 Mbit/s
LAN connection. The core network consists of a 1 Gbit/s
connection. The recordings took place between the external
optical fiber modem and the first firewall. The measured link
was only mildly loaded during this period. The background
traffic we choose lasts for 3,600 seconds. The botnet traffic
is mixed with the background traffic between 2,000 and 2,300
seconds.

B. Results of Network Anomaly Detection 

We divide the mixed traffic into 10-second windows
and create a sequence of 360 SIGs. Fig. 2-A shows the
detection results. The blue “+” markers indicate the value of
$I_{ER}(\mu_i; \beta)$ for each window i, i = 1, ..., 360, where $\mu_i$ is
the empirical degree distribution of SIG i and $\beta$ is estimated
from the SIGs created using only background traffic. The
red dashed line shows the threshold $\lambda = 0.18$, which can be
set to keep the false alarm rate below a desired value.
According to rule (1), there are 36 abnormal SIGs, namely
$|\mathcal{A}| = 36$. There are 30 SIGs that contain botnet traffic, and 29
of them are correctly identified. SIG no. 20, corresponding to the
time range [2000 s, 2010 s], is missed. Being the start of the
botnet traffic, this range has very low botnet activity, which
may explain the missed detection. In addition, there are two
groups of false alarms: 3 false alarms around 3,000 s and
4 false alarms around 3,500 s. Fig. 2-B shows the Receiver
Operating Characteristic (ROC) curve of the detection rule
(1).
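The counts reported above determine a single operating point of the detector. The short computation below derives the corresponding rates; these rates are our arithmetic on the reported counts and are not figures quoted from the experiments.

```python
def detection_rates(num_windows, num_attack_windows, num_alarms, num_true_alarms):
    """Return (true positive rate, false alarm rate) from window-level counts."""
    false_alarms = num_alarms - num_true_alarms
    normal_windows = num_windows - num_attack_windows
    tpr = num_true_alarms / num_attack_windows
    fpr = false_alarms / normal_windows
    return tpr, fpr

# 360 windows, 30 with botnet traffic, 36 alarms of which 29 are correct.
tpr, fpr = detection_rates(360, 30, 36, 29)
```

At this threshold, 29 of the 30 attack windows are detected and 7 of the 330 background-only windows raise a false alarm.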

C. Results of Botnet Discovery 

The botnet discovery stage aims to identify bots based on
the information in $\mathcal{A}$. The first step is to identify a set of
pivotal nodes. Recall that the total interaction measure $e_i$
in (2) quantifies the amount of interaction in $\mathcal{A}$ of node
i with other nodes. The set of pivotal nodes is $\mathcal{N} =
\{i : e_i > \tau\}$, where $\tau$ is a prescribed threshold. Let $e_{\max}$
be the maximum total interaction measure of all nodes and
$\mathcal{E}^{\mathrm{Norm}} = \{e_i/e_{\max} : i = 1, \ldots, n\}$ be the normalized set of
total interaction measures. Fig. 3 plots $\mathcal{E}^{\mathrm{Norm}}$ in descending
order, with the y-axis in log scale. Each blue “+” marker
represents one node. The blue curve in Fig. 3, being quite
steep, clearly indicates the existence of influential pivotal
nodes. The red dashed line in Fig. 3 plots the selected threshold
$\tau$, which results in 3 pivotal nodes. Only one pivotal node
belongs to the botnet. The other two pivotal nodes are
active normal nodes. These two falsely detected pivotal
nodes correspond to the two false-alarm groups described
in Section V-B.

Fig. 2. Figure A shows the rate function value $I_{ER}(\mu_i; \beta)$ for each window
i. The x-axis plots the starting time of each window. The background
traffic lasts for 3,600 seconds and the botnet traffic is added between 2,000
and 2,300 seconds. Figure B shows the ROC curve. The x-axis plots the
false alarm rate and the y-axis the true positive rate.

Fig. 3. Sorted amount of interaction in $\mathcal{A}$ defined by (2). The y-axis is in
log scale.

Our dataset has 396 nodes, including 136 bots and 260
normal nodes. Among the 396 nodes, only 213 nodes have
positive sample standard deviations. Let $\mathcal{V}_p$ be the set of all
nodes with positive sample standard deviations. Fig. 4 plots
the correlation matrix of these nodes. We can easily observe
two groups in Fig. 4.

We calculate the SCG $\mathcal{C}$ using Def. 5 and threshold $\tau_\rho =
0.3$. In the SCG $\mathcal{C}$, there are 191 isolated nodes with degree
zero. The subgraph formed by the remaining 205 nodes has
two connected components (Fig. 5-A). Fig. 5-A plots normal
nodes as blue circles and bots as red squares. Although
the bots and the normal nodes clearly belong to different
communities, the two communities are not separated at the
narrowest part of the graph. Instead, the separating line is
closer to the bots.

Fig. 5. Comparison of different community detection techniques on the SCG. Fig. A shows the ground-truth communities of bots and normal nodes. Fig.
B is the result of our botnet discovery approach. Fig. C is the result of the vector programming method proposed by Agarwal et al. [23]. Fig. D is the
result of the walktrap method [32] with three communities. Fig. E is the result of Newman’s leading eigenvector method [19] with three communities. Fig. F
is the result of the leading eigenvector method with five communities. In Figs. A-C, red squares are bots and blue circles are normal nodes. In Figs. D-F,
red squares indicate the group with the highest average pivotal interaction measure, while blue circles indicate the group with the lowest one.

We apply our botnet discovery method to $\mathcal{C}$. The result
(Fig. 5-B) is very close to the ground truth (Fig. 5-A).
For comparison, we also apply other community detection
methods to the 205-node subgraph.

The first method is the vector programming method proposed
by Agarwal et al. [23], which is a special case of
our method in which $w_1 = 0$ and $w_2 = 0$. This approach,
however, misses a number of bots (Fig. 5-C).

The second method is the walktrap method by Pons and
Latapy [32], [33], which defines a distance measure for
vertices based on a random walk and applies hierarchical
clustering [34]. When the desired number of communities,
a required parameter, equals two, the method outputs the
two connected components, a reasonable yet useless result
for botnet discovery. To make the results more meaningful,
we use walktrap to find three communities and ignore
the smallest one, which corresponds to the smaller connected
component (right triangles in Fig. 5-D). The community
with a higher mean pivotal interaction measure is detected
as the botnet, and the rest of the nodes are labeled as normal.
The walktrap method separates bots and normal nodes at
the narrowest part of the graph, a reasonable result from the
perspective of community detection (Fig. 5-D). However, a
comparison with the ground truth reveals that many normal
nodes are falsely reported as bots.

The third method is Newman’s leading eigenvector
method [19], [33], a classical modularity-based community
detection method. This method calculates the eigenvector
corresponding to the largest eigenvalue of the modularity
matrix M, namely the leading eigenvector, and lets the
solution s be the sign of the leading eigenvector. The method
can be generalized to detect multiple communities [19].
Similar to the walktrap method, the leading eigenvector
method reports the two connected components as the result when
the desired number of communities is two. We also use this
method to find three communities and ignore the smallest
one. Again, the community with the higher mean
pivotal interaction measure is detected as the botnet.

Unlike the previous methods, the eigenvector method
makes a completely wrong prediction of the botnet. The community
whose majority are bots (blue circles in Fig. 5-E)
is wrongly detected as the normal part, and the community
formed by the rest of the nodes is wrongly detected as the
botnet. Despite being part of the real botnet, the community
of blue circles in Fig. 5-E actually has a lower mean pivotal
interaction measure, i.e., less overall communication with
pivotal nodes.

After dividing the SCG $\mathcal{C}$ into five communities using the
leading eigenvector approach for multiple communities [19], we
observe that the botnet itself is heterogeneous and divided
into three groups. Both the group with the highest mean
pivotal interaction measure (Group II in Fig. 5-F) and the
group with the lowest mean (Group I in Fig. 5-F) are part
of the botnet.

Because of the heterogeneity, some groups of the botnet
may be misclassified. On the one hand, the leading eigenvector
method wrongly separates Group I from the rest as
a single community and merges Groups II and IV with the
normal part (Group III). Because Group I has the lowest
pivotal interaction measure, it is wrongly detected as normal,
causing Groups II, III, and IV to be detected as the botnet. On the
other hand, the vector programming method wrongly detects
many nodes in Group II, which should be bots, as normal
nodes.

By taking the pivotal interaction measure into consideration,
the misclassification can be avoided. In our formulation
of refined modularity (5), the term $w_1 \sum_i r_i s_i$ maximizes the
difference between the pivotal interaction measure of the botnet
and that of the normal part. Owing to this term, our method
makes few mistakes for nodes in Group II since they have
high pivotal interaction measures.

VI. Conclusion 

In this paper, we propose a novel method of botnet
detection that analyzes the social relationships, modeled
as Social Interaction Graphs (SIGs) and Social Correlation
Graphs (SCGs), of nodes in the network. Compared
to previous methods, our method has the following novelties.
First, our method applies social network analysis to botnet
detection and can detect botnets with sophisticated C&C
channels. Second, our method can be generalized to more
types of networks, such as email networks and biological
networks [35], [36]. Third, we propose a refined modularity
measure that is suitable for botnet detection. The refined
modularity also addresses some limitations of modularity.

References 

[1] “DDoS Protection Whitepaper,” 2012, http://www.neustar.biz/enterprise/resources/ddos-protection/ddos-attacks-survey-whitepaper#.UtwNR7Uo70o

[2] N. Daswani and M. Stoppelman, “The anatomy of Clickbot.A,” in
Proceedings of the first conference on First Workshop on Hot Topics
in Understanding Botnets, 2007.

[3] Z. Gyongyi and H. Garcia-Molina, “Web spam taxonomy,” in First 
international workshop on adversarial information retrieval on the 
web (AIRWeb 2005), 2005. 

[4] W. T. Strayer, R. Walsh, C. Livadas, and D. Lapsley, “Detecting botnets 
with tight command and control,” in Proceedings of 2006 31st IEEE 
Conference on Local Computer Networks. IEEE, 2006, pp. 195-202. 

[5] X. Su and D. Zhang, “Botnet detecting method based on clustering
flow attributes of command and control communication channel,”
Dianzi Yu Xinxi Xuebao (Journal of Electronics and Information
Technology), vol. 34, no. 8, pp. 1993-1999, 2012.

[6] J. Binkley and S. Singh, “An algorithm for anomaly-based botnet 
detection,” Proceedings of USENIX Steps to Reducing Unwanted 
Traffic on the Internet Workshop (SRUTI), pp. 43-48, 2006. 

[7] J. Goebel and T. Holz, “Rishi: Identify bot contaminated hosts by IRC 
nickname evaluation,” in Proceedings of the first conference on First 
Workshop on Hot Topics in Understanding Botnets. Cambridge, MA, 
2007, p. 8. 

[8] G. Gu, J. Zhang, and W. Lee, “BotSniffer: Detecting botnet command 
and control channels in network traffic,” in Proceedings of 15th Annual 
Network and Distributed System Security Symposium, 2008. 

[9] Z. Bu, P. Bueno, R. Kashyap, and A. Wosotowsky, “The New Era 
of Botnets,” White paper from McAfee, 2010, 
https://www.botnets.fr/images/b/b5/Wp-new-era-of-botnets.pdf 

[10] R. Lemos, “Bot software looks to improve peer-age,” 2006, http:// 
www.securityfocus.com/news/11390 


[11] A. Singh, “Social Networking for Botnet Command and Control,” 
Ph.D. dissertation, San Jose State University, 2012. 

[12] Y. Al-Hammadi and A. Abdulla, “Behavioural Correlation for 
Malicious Bot Detection,” Ph.D. dissertation, University of Nottingham, 
2010. 

[13] S. Mukherjee, “Large deviation for the empirical degree distribution 
of an Erdős-Rényi graph,” arXiv preprint arXiv:1310.4160, pp. 1-23, 
2013. 

[14] A. Dembo and O. Zeitouni, Large Deviations Techniques and 
Applications, 2nd ed. Springer, 1998. 

[15] W. Hoeffding, “Asymptotically optimal tests for multinomial 
distributions,” Annals of Mathematical Statistics, vol. 36, pp. 369-401, 1965. 

[16] D. Shah and T. Zaman, “Community detection in networks: The 
leader-follower algorithm,” arXiv preprint arXiv:1011.0774, pp. 1-13, 
2010. 

[17] M. Newman, “Fast algorithm for detecting community structure in 
networks,” Physical Review E, vol. 69, no. 6, p. 066133, 2004. 

[18] M. Newman, “Detecting community structure in networks,” The European 
Physical Journal B - Condensed Matter, vol. 38, no. 2, pp. 321-330, 
2004. 

[19] M. Newman, “Finding community structure in networks using the 
eigenvectors of matrices,” Physical Review E, vol. 74, no. 3, p. 036104, 2006. 

[20] M. Molloy and B. Reed, “A critical point for random graphs with a 
given degree sequence,” Random structures and algorithms, vol. 6, no. 
2-3, pp. 161-180, 1995. 

[21] S. Fortunato and M. Barthelemy, “Resolution limit in community 
detection,” Proceedings of the National Academy of Sciences of the 
United States of America, vol. 104, no. 1, pp. 36-41, 2007. 

[22] A. Lancichinetti and S. Fortunato, “Limits of modularity maximization 
in community detection,” Physical Review E, vol. 84, no. 6, p. 066122, 
2011. 

[23] G. Agarwal and D. Kempe, “Modularity-maximizing graph 
communities via mathematical programming,” The European Physical Journal 
B, vol. 66, no. 3, pp. 409-418, Nov. 2008. 

[24] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, 
Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE 
Transactions on Knowledge and Data Engineering, vol. 20, no. 2, 
pp. 172-188, 2008. 

[25] J. Duch and A. Arenas, “Community detection in complex networks 
using extremal optimization,” Physical Review E, vol. 72, no. 2, p. 
027104, 2005. 

[26] E. Y. K. Chan and D.-Y. Yeung, “A convex formulation of modularity 
maximization for community detection,” Proceedings of the 
Twenty-Second international joint conference on Artificial Intelligence, vol. 3, 
pp. 2218-2225, 2011. 

[27] A. D’Aspremont and S. Boyd, “Relaxations and randomized methods 
for nonconvex QCQPs,” pp. 1-16, 2003, http://web.stanford.edu/class/ 
ee364b/lectures/OLDrelaxations.pdf 

[28] B. Borchers, “CSDP, A C library for semidefinite programming,” 
Optimization Methods and Software, vol. 1, no. 1, pp. 1-10, 1999. 

[29] K. Fujisawa, M. Kojima, K. Nakata, and M. Yamashita, “SDPA 
SemiDefinite Programming Algorithm,” Department of Mathematical 
and Computing Science, Tokyo Institute of Technology, Tech. Rep. 
B-308, 1995, http://www.is.titech.ac.jp/~kojima/articles^-308.ps.Z 

[30] “The CAIDA UCSD ‘DDoS Attack 2007’ Dataset,” CAIDA, 2013, 
http://www.caida.org/data/passive/ddos-20070804_dataset.xml 

[31] R. R. R. Barbosa, R. Sadre, A. Pras, and R. van de Meent, 
“Simpleweb/university of twente traffic traces data repository,” 
http://eprints.eemcs.utwente.nl/17829/, Technical Report TR-CTIT-10- 
19, April 2010. 

[32] P. Pons and M. Latapy, “Computing communities in large networks 
using random walks,” Computer and Information Sciences-ISCIS 2005, 
2005. 

[33] G. Csardi and T. Nepusz, “The igraph software package for complex 
network research,” InterJournal, Complex Systems, vol. 1695, no. 5, 
2006. 

[34] J. H. Ward Jr, “Hierarchical grouping to optimize an objective 
function,” Journal of the American Statistical Association, vol. 58, no. 301, 
pp. 236-244, 1963. 

[35] M. Newman, S. Forrest, and J. Balthrop, “Email networks and the 
spread of computer viruses,” Physical Review E, vol. 66, no. 3, p. 
35101, 2002. 

[36] M. Newman, Networks: an introduction. Oxford University Press, 
2009. 


